DAPLSR: Data Augmentation Partial Least Squares Regression Model via Manifold Optimization

Chen, Haoran, Liu, Jiapeng, Wang, Jiafan, Shi, Wenjun

arXiv.org Artificial Intelligence

Traditional Partial Least Squares Regression (PLSR) models frequently underperform on data with imbalanced classes. To address this issue, this paper proposes a Data Augmentation Partial Least Squares Regression (DAPLSR) model via manifold optimization. The DAPLSR model introduces the Synthetic Minority Over-sampling Technique (SMOTE) to increase the number of samples and uses the Value Difference Metric (VDM) to select the nearest-neighbor samples that most closely resemble the original samples when generating synthetic ones. To obtain a more accurate numerical solution for PLSR, the paper further proposes a manifold optimization method that exploits the geometric properties of the constraint space to mitigate model degradation and improve optimization. Comprehensive experiments show that the proposed DAPLSR model achieves superior classification performance and outstanding evaluation metrics on various datasets, significantly outperforming existing methods.
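
As a rough illustration of the augmentation step, the sketch below implements SMOTE-style synthetic-sample generation with scikit-learn. Euclidean nearest neighbors stand in for the paper's VDM-based neighbor selection (VDM is defined over categorical attributes), so the distance choice, `k`, and function names are illustrative assumptions, not the authors' exact procedure.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def smote_oversample(X_min, n_new, k=5, rng=None):
    """Generate n_new synthetic minority samples by interpolating
    between each seed sample and one of its k nearest neighbors."""
    rng = np.random.default_rng(rng)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    # Drop column 0: each point is its own nearest neighbor.
    neigh = nn.kneighbors(X_min, return_distance=False)[:, 1:]
    seeds = rng.integers(0, len(X_min), size=n_new)
    picks = neigh[seeds, rng.integers(0, k, size=n_new)]
    gaps = rng.random((n_new, 1))  # interpolation weights in [0, 1)
    return X_min[seeds] + gaps * (X_min[picks] - X_min[seeds])

# Toy usage: grow a minority class of 20 samples by 80 synthetic ones.
X_min = np.random.default_rng(0).normal(size=(20, 5))
X_syn = smote_oversample(X_min, n_new=80, k=5, rng=1)
print(X_syn.shape)  # (80, 5)
```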


Topic mining based on fine-tuning Sentence-BERT and LDA

Li, Jianheng, Chen, Lirong

arXiv.org Artificial Intelligence

Research background: With the continuous development of society, consumers pay more attention to fine-grained product attributes when shopping. Research purpose: This study fine-tunes the Sentence-BERT word embedding model and the LDA model to mine topic features in online product reviews and show consumers the details of various aspects of the goods. Research methods: First, the Sentence-BERT model was fine-tuned on e-commerce online reviews, converting the review text into word vectors with richer semantic information. Second, the vectorized word set is input into the LDA model for topic feature extraction. Finally, keyword analysis under each topic highlights the key functions of the product. Results: This study compared the proposed model with other word embedding models and LDA models, and with common topic extraction methods. The topic coherence of the proposed model is 0.5 higher than that of the other models, improving the accuracy of topic extraction.
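
A minimal sketch of the described pipeline, under stated assumptions: a generic pretrained Sentence-BERT checkpoint (`all-MiniLM-L6-v2`) stands in for the paper's fine-tuned model, a standard bag-of-words LDA from gensim handles topic extraction, and `c_v` topic coherence serves as the reported metric. The toy reviews and parameter choices are illustrative.

```python
from sentence_transformers import SentenceTransformer
from gensim.corpora import Dictionary
from gensim.models import LdaModel, CoherenceModel

reviews = [
    "battery life is great and charging is fast",
    "screen cracked after a week, build quality is poor",
    "camera photos look sharp even in low light",
]
tokens = [r.split() for r in reviews]

# Sentence-BERT embeddings (generic checkpoint, not the paper's
# fine-tuned one), available for any embedding-based filtering step.
embedder = SentenceTransformer("all-MiniLM-L6-v2")
vecs = embedder.encode(reviews)
print(vecs.shape)

# Standard bag-of-words LDA for topic feature extraction.
dic = Dictionary(tokens)
bow = [dic.doc2bow(t) for t in tokens]
lda = LdaModel(bow, num_topics=2, id2word=dic, random_state=0)

# Topic coherence, the evaluation metric the study reports.
cm = CoherenceModel(model=lda, texts=tokens, dictionary=dic, coherence="c_v")
print(lda.print_topics(), cm.get_coherence())
```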


Assessing and Predicting Air Pollution in Asia: A Regional and Temporal Study (2018-2023)

Rahman, Anika, Khatun, Mst. Taskia

arXiv.org Artificial Intelligence

This study analyzes and predicts air pollution in Asia, focusing on PM2.5 levels from 2018 to 2023 across five regions: Central, East, South, Southeast, and West Asia. South Asia emerged as the most polluted region, with Bangladesh, India, and Pakistan consistently showing the highest PM2.5 levels, and death rates especially high in Nepal, Pakistan, and India. East Asia showed the lowest pollution levels. K-means clustering categorized countries into high, moderate, and low pollution groups. The ARIMA model effectively predicted 2023 PM2.5 levels (MAE: 3.99, MSE: 33.80, RMSE: 5.81, R: 0.86). The findings emphasize the need for targeted interventions to address severe pollution and health risks in South Asia.
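
A minimal sketch of the forecasting step using statsmodels, assuming a synthetic monthly PM2.5 series and an ARIMA(1,1,1) order; the study's actual data, order selection, and evaluation setup are not reproduced here.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Synthetic monthly PM2.5 series: 2018-2022 for training, 2023 held out.
rng = np.random.default_rng(0)
series = 60 + 10 * np.sin(np.arange(72) * 2 * np.pi / 12) + rng.normal(0, 3, 72)
train, test = series[:60], series[60:]

model = ARIMA(train, order=(1, 1, 1)).fit()  # order is an assumption
pred = model.forecast(steps=len(test))

mae = mean_absolute_error(test, pred)
rmse = mean_squared_error(test, pred) ** 0.5
print(f"MAE={mae:.2f} RMSE={rmse:.2f}")
```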


Leveraging Large Language Models For Optimized Item Categorization using UNSPSC Taxonomy

Singh, Anmolika, Diao, Yuhang

arXiv.org Artificial Intelligence

Effective item categorization is vital for businesses, enabling the transformation of unstructured datasets into organized categories that streamline inventory management. Despite its importance, item categorization remains highly subjective and lacks a uniform standard across industries and businesses. The United Nations Standard Products and Services Code (UNSPSC) provides a standardized system for cataloguing inventory, yet employing UNSPSC categorizations often demands significant manual effort. This paper investigates the deployment of Large Language Models (LLMs) to automate the classification of inventory data into UNSPSC codes based on Item Descriptions. We evaluate the accuracy and efficiency of LLMs in categorizing diverse datasets, exploring their language processing capabilities and their potential as a tool for standardizing inventory classification. Our findings reveal that LLMs can substantially diminish the manual labor involved in item categorization while maintaining high accuracy, offering a scalable solution for businesses striving to enhance their inventory management practices.
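
A minimal sketch of such a classification loop, assuming the OpenAI chat API and a hypothetical prompt; the paper does not specify which LLM, prompt, or output format was used.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical prompt; the paper's actual prompt is not published here.
PROMPT = (
    "You are an inventory classifier. Given an item description, "
    "return the single most appropriate 8-digit UNSPSC code and its "
    "title, formatted as 'code: title'. Description: {desc}"
)

def classify_item(desc: str, model: str = "gpt-4o-mini") -> str:
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": PROMPT.format(desc=desc)}],
        temperature=0,  # deterministic output for consistent categorization
    )
    return resp.choices[0].message.content

print(classify_item("stainless steel hex bolts, M8 x 40mm, box of 100"))
```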


Decoding AI and Human Authorship: Nuances Revealed Through NLP and Statistical Analysis

Akinwande, Mayowa, Adeliyi, Oluwaseyi, Yussuph, Toyyibat

arXiv.org Artificial Intelligence

This research explores the nuanced differences between texts produced by AI and those written by humans, aiming to elucidate how language is expressed differently by each. Through comprehensive statistical data analysis, the study investigates various linguistic traits, patterns of creativity, and potential biases inherent in human-written and AI-generated texts. The significance of this research lies in its contribution to understanding AI's creative capabilities and its impact on literature, communication, and societal frameworks. By examining a meticulously curated dataset comprising 500K essays spanning diverse topics and genres, generated by LLMs or written by humans, the study uncovers the deeper layers of linguistic expression and provides insights into the cognitive processes underlying both AI- and human-driven textual compositions. The analysis revealed that human-authored essays tend to have a higher total word count on average than AI-generated essays but a shorter average word length, and while both groups exhibit high levels of fluency, the vocabulary diversity of human-authored content is higher than that of AI-generated content. However, AI-generated essays show a slightly higher level of novelty, suggesting the potential for generating more original content through AI systems. The paper addresses challenges in assessing the language generation capabilities of AI models and emphasizes the importance of datasets that reflect the complexities of human-AI collaborative writing. Through systematic preprocessing and rigorous statistical analysis, this study offers valuable insights into the evolving landscape of AI-generated content and informs future developments in natural language processing (NLP).
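
A minimal sketch of the kinds of surface statistics compared here: word count, average word length, and type-token ratio as a simple vocabulary-diversity proxy. The study's exact metrics and preprocessing are assumptions.

```python
import re

def text_stats(text: str) -> dict:
    """Word count, mean word length, and type-token ratio
    (a simple vocabulary-diversity proxy)."""
    words = re.findall(r"[a-zA-Z']+", text.lower())
    return {
        "word_count": len(words),
        "avg_word_len": sum(map(len, words)) / len(words),
        "type_token_ratio": len(set(words)) / len(words),
    }

human = "The old harbor smelled of salt and rope and long afternoons."
ai = "The harbor environment exhibited characteristic maritime odors."
print("human:", text_stats(human))
print("ai:   ", text_stats(ai))
```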


Predicting Accident Severity: An Analysis Of Factors Affecting Accident Severity Using Random Forest Model

Adefabi, Adekunle, Olisah, Somtobe, Obunadike, Callistus, Oyetubo, Oluwatosin, Taiwo, Esther, Tella, Edward

arXiv.org Artificial Intelligence

Road accidents have significant economic and societal costs, with a small number of severe accidents accounting for a large portion of these costs. Predicting accident severity can support a proactive approach to road safety by identifying potentially unsafe road conditions and taking well-informed actions to reduce the number of severe accidents. This study investigates the effectiveness of the Random Forest machine learning algorithm for predicting the severity of an accident. The model is trained on a dataset of accident records from a large metropolitan area and evaluated using various metrics. Hyperparameters and feature selection are optimized to improve the model's performance. The results show that the Random Forest model is an effective tool for predicting accident severity, with an accuracy of over 80%. The study also identifies the top six most important variables in the model: wind speed, pressure, humidity, visibility, clear conditions, and cloud cover. The fitted model has an Area Under the Curve of 80%, a recall of 79.2%, a precision of 97.1%, and an F1 score of 87.3%. These results suggest that the proposed model explains the target variable, the accident severity class, well. Overall, the study provides evidence that the Random Forest model is a viable and reliable tool for predicting accident severity and can be used to help reduce the number of fatalities and injuries due to road accidents in the United States.
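
A minimal sketch of the modeling setup with scikit-learn, using synthetic data with the six reported feature names as a stand-in for the real accident records; hyperparameter values are illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, roc_auc_score

# Synthetic stand-in for the accident dataset: six weather features,
# binary severity label (the real study uses recorded accident data).
rng = np.random.default_rng(0)
feats = ["wind_speed", "pressure", "humidity", "visibility", "clear", "cloud_cover"]
X = rng.normal(size=(2000, len(feats)))
y = (X[:, 0] + 0.5 * X[:, 2] - X[:, 3] + rng.normal(0, 1, 2000) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_tr, y_tr)

print(classification_report(y_te, clf.predict(X_te)))
print("AUC:", roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))
# Rank feature importances, mirroring the study's top-variable analysis.
for name, imp in sorted(zip(feats, clf.feature_importances_), key=lambda t: -t[1]):
    print(f"{name}: {imp:.3f}")
```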


A Review of the Ethics of Artificial Intelligence and its Applications in the United States

Taiwo, Esther, Akinsola, Ahmed, Tella, Edward, Makinde, Kolade, Akinwande, Mayowa

arXiv.org Artificial Intelligence

This study focuses on the ethics of Artificial Intelligence and its application in the United States. The paper highlights the impact AI has on every sector of the US economy and multiple facets of the technological space, and the resultant effect on entities spanning businesses, government, academia, and civil society. Ethical considerations are needed as these entities begin to depend on AI for delivering various crucial tasks, which immensely influence their operations, decision-making, and interactions with each other. The adoption of ethical principles, guidelines, and standards of work is therefore required throughout the entire process of AI development, deployment, and usage to ensure responsible and ethical AI practices. Our discussion explores eleven fundamental "ethical principles" structured as overarching themes. These encompass Transparency; Justice, Fairness, and Equity; Non-Maleficence; Responsibility and Accountability; Privacy; Beneficence; Freedom and Autonomy; Trust; Dignity; Sustainability; and Solidarity. These principles collectively serve as a guiding framework, directing the ethical path for the responsible development, deployment, and utilization of artificial intelligence (AI) technologies across diverse sectors and entities within the United States. The paper also discusses the revolutionary impact of AI applications, such as Machine Learning, and explores various approaches used to implement AI ethics. This examination is crucial to address the growing concerns surrounding the inherent risks associated with the widespread use of artificial intelligence. Since the advent of artificial intelligence, various applications have been developed that have assisted human productivity and alleviated human effort, resulting in efficient time management. Artificial intelligence has aided businesses, healthcare, information technology, banking, transportation, and robotics. The term "artificial intelligence" refers to reproducing human intelligence processes using machines, specifically computer systems [1]. Artificial intelligence allows the United States of America to run more efficiently, produce cleaner products, reduce adverse environmental impacts, promote public safety, and improve human health. Until recently, conversations around "AI ethics" were limited to academic institutions and non-profit organizations.


Clustering an African Hairstyle Dataset Using PCA and K-Means

Nicrocia, Teffo Phomolo, Adewale, Owolawi Pius, Diana, Pholo Moanda

arXiv.org Artificial Intelligence

The adoption of digital transformation has not yet extended to building an African face-shape classifier. African women rely on beauty-standard recommendations, personal preference, or the newest trends in hairstyles to decide on the appropriate hairstyle for them. In this paper, an approach is presented that uses K-means clustering to classify images of African women: in order to identify potential facial clusters, a Haar cascade is used for feature-based face detection, and K-means clustering is applied for image classification.
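
A minimal sketch of the described pipeline with OpenCV and scikit-learn: Haar-cascade face detection, PCA for dimensionality reduction (per the title), and K-means clustering. The file paths, crop size, component count, and cluster count are placeholder assumptions.

```python
import cv2
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

# Pretrained OpenCV Haar cascade for frontal face detection.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def face_vector(path, size=(64, 64)):
    """Detect the largest face in an image, crop it, flatten to a vector."""
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    if gray is None:
        return None
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None
    x, y, w, h = max(faces, key=lambda f: f[2] * f[3])
    return cv2.resize(gray[y:y + h, x:x + w], size).flatten()

paths = ["img_001.jpg", "img_002.jpg"]  # placeholder dataset paths
vecs = [v for p in paths if (v := face_vector(p)) is not None]
if vecs:
    X = np.array(vecs, dtype=float)
    X_low = PCA(n_components=min(50, *X.shape)).fit_transform(X)
    labels = KMeans(n_clusters=min(3, len(X)), n_init=10,
                    random_state=0).fit_predict(X_low)
    print(labels)
```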


Explaining the Performance of Collaborative Filtering Methods With Optimal Data Characteristics

Poudel, Samin, Bikdash, Marwan

arXiv.org Artificial Intelligence

The performance of a Collaborative Filtering (CF) method depends on the properties of the User-Item Rating Matrix (URM), and these properties, or Rating Data Characteristics (RDC), are constantly changing. Recent studies explained much of the variation in CF performance caused by changes in the URM using six or more RDC. Here, we find that a significant proportion of the variation in the performance of different CF techniques can be attributed to just two RDC: the number of ratings per user, or Information per User (IpU), and the number of ratings per item, or Information per Item (IpI). Moreover, the performance of CF algorithms is quadratic in IpU (or IpI) for a square URM. The findings of this study are based on seven well-established CF methods and three popular public recommender datasets: the 1M MovieLens, 25M MovieLens, and Yahoo! Music Rating datasets.
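
To illustrate the claimed quadratic relationship, the sketch below fits performance as a second-degree polynomial in IpU; the numbers are synthetic stand-ins, not the study's measurements.

```python
import numpy as np

# Synthetic stand-in: CF error (RMSE) measured at several IpU levels.
ipu = np.array([5, 10, 20, 40, 60, 80, 100], dtype=float)
rmse = np.array([1.10, 1.02, 0.95, 0.90, 0.88, 0.875, 0.874])

# Fit performance as a quadratic function of IpU, as the study suggests.
coeffs = np.polyfit(ipu, rmse, deg=2)
fit = np.poly1d(coeffs)

ss_res = np.sum((rmse - fit(ipu)) ** 2)
ss_tot = np.sum((rmse - rmse.mean()) ** 2)
print("quadratic coefficients:", coeffs)
print("R^2 of quadratic fit:", 1 - ss_res / ss_tot)
```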


X-ReCoSa: Multi-Scale Context Aggregation For Multi-Turn Dialogue Generation

Wu, Danqin

arXiv.org Artificial Intelligence

In multi-turn dialogue generation, responses are related not only to the topic and background of the context but also to the words and phrases in the sentences of the context. However, currently widely used hierarchical dialogue models rely solely on context representations from the utterance-level encoder, ignoring the sentence representations output by the word-level encoder. This inevitably results in a loss of information during decoding and generation. In this paper, we propose a new dialogue model, X-ReCoSa, which tackles this problem by aggregating multi-scale context information for hierarchical dialogue models. Specifically, we divide the generation decoder into upper and lower parts, namely the intention part and the generation part. First, the intention part takes context representations as input to generate the intention of the response; then the generation part generates words depending on the sentence representations. In this way, hierarchical information is fused into response generation. We conduct experiments on the English dataset DailyDialog. Experimental results show that our method outperforms baseline models on both automatic metric-based and human evaluations.
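
A minimal PyTorch sketch of the decoder split described above: an upper "intention" block attending to utterance-level context states and a lower "generation" block attending to word-level sentence states. Dimensions, attention modules, and names are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class TwoPartDecoder(nn.Module):
    """Decoder split into an intention part (attends to utterance-level
    context states) and a generation part (attends to word-level
    sentence states), fusing both scales into response generation."""
    def __init__(self, vocab_size: int, d: int = 256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d)
        self.intention = nn.MultiheadAttention(d, 4, batch_first=True)
        self.generation = nn.MultiheadAttention(d, 4, batch_first=True)
        self.out = nn.Linear(d, vocab_size)

    def forward(self, tokens, ctx_states, sent_states):
        # tokens: (B, T) response prefix; ctx_states: (B, U, d) from the
        # utterance-level encoder; sent_states: (B, S, d) from the
        # word-level encoder.
        h = self.embed(tokens)
        h, _ = self.intention(h, ctx_states, ctx_states)     # upper part
        h, _ = self.generation(h, sent_states, sent_states)  # lower part
        return self.out(h)  # (B, T, vocab) next-token logits

B, T, U, S, d, V = 2, 7, 4, 30, 256, 1000
dec = TwoPartDecoder(V, d)
logits = dec(torch.randint(0, V, (B, T)),
             torch.randn(B, U, d), torch.randn(B, S, d))
print(logits.shape)  # torch.Size([2, 7, 1000])
```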